Screening for Emotional Distress in Cancer Patients: 
A Systematic Review of Assessment Instruments 

Andrea Vodermaier, Wolfgang Linden, Christopher Siu 

Screening for emotional distress is becoming increasingly common in cancer care. This systematic review examines the psy- 
chometric properties of the existing tools used to screen patients for emotional distress, with the goal of encouraging screening 
programs to use standardized tools that have strong psychometrics. Systematic searches of MEDLINE and PsycINFO databases 
for English-language studies in cancer patients were performed using a uniform set of key words (eg, depression, anxiety, 
screening, validation, and scale), and the retrieved studies were independently evaluated by two reviewers. Evaluation criteria 
included the number of validation studies, the number of participants, generalizability, reliability, the quality of the criterion 
measure, sensitivity, and specificity. The literature search yielded 106 validation studies that described a total of 33 screening 
measures. Many generic and cancer-specific scales satisfied a fairly high threshold of quality in terms of their psychometric 
properties and generalizability. Among the ultrashort measures (ie, those containing one to four items), the Combined Depression 
Questions performed best in patients receiving palliative care. Among the short measures (ie, those containing five to 20 items), 
the Center for Epidemiologic Studies-Depression Scale and the Hospital Anxiety and Depression Scale demonstrated adequate 
psychometric properties. Among the long measures (ie, those containing 21-50 items), the Beck Depression Inventory and the 
General Health Questionnaire-28 met all evaluation criteria. The Psychosocial Screen for Cancer, the Questionnaire on Stress in 
Cancer Patients-Revised, and the Rotterdam Symptom Checklist are long measures that can also be recommended for routine 
screening. In addition, other measures may be considered for specific indications or disease types. Some measures, particularly 
newly developed cancer-specific scales, require further validation against structured clinical interviews (the criterion standard 
for validation measures) before they can be recommended. 
Transient mood disturbances occur frequently among cancer 
patients during the disease trajectory, and depression often persists 
in these patients (1). Consequently, psychosocial counseling has 
become an integral part of cancer care, and several meta-analyses 
support its efficacy (2-4). More specifically, behavioral interven- 
tions (5-7) and supportive-expressive group therapy (8,9) are ef- 
fective in reducing emotional distress in cancer patients. These 
treatments work best for patients with pronounced clinical symp- 
toms of emotional distress (10). To maximize the use of limited 
treatment resources and provide equitable access to mental health 
services, emotionally distressed cancer patients need to be reliably 
identified. Traditionally, referrals for mental health services are 
either self-initiated or based on physician judgment. However, the 
concordance rates between patients' self-report and physicians' 
clinical impressions are low, thus identifying a need for standard- 
ized validated tools for measuring emotional distress (11,12). 
The so-called criterion standard clinical assessment interviews for 
emotional distress, whether standardized (eg, the Composite 
International Diagnostic Interview [CIDI] for DSM-IV Axis I 
Disorders) or structured (eg, the Structured Clinical Interview for 
DSM-IV Axis I Disorders [SCID-I]), are time consuming for both the 
patient and the clinical staff who administer them and are, 
therefore, costly; their routine implementation in busy clinics is 
thus unlikely. Furthermore, patients who are receiving 
palliative care may not be physically able to complete lengthy 
diagnostic interviews. Thus, relatively brief but validated 
questionnaires would seem to be the tools of choice for routine screening 
of cancer patients' emotional distress. Brief self-reports are easy to 
administer, inexpensive (some are even free), and, if properly vali- 
dated, can help identify those patients most in need of professional 
mental health support. 

A distinct advantage of systematic screening of cancer patients 
for emotional distress is that it is likely to promote equal access to 
psychological services, whereas a system that is based only on 
physician- or patient-initiated referrals might fail to identify and/ 
or overlook a substantial proportion of emotionally distressed 
patients who are in need of supportive treatment. Furthermore, 
systematic screening allows mental health staff to forecast their 
workload. To date, however, only a minority of cancer centers in 
the United States (13), the United Kingdom (14), and Canada (15) 
have implemented emotional distress screening of patients with 
standardized tools. Time constraints of health professionals and 
insufficient knowledge about the appropriate screening tool may 
partially account for the infrequent use of high-quality screening 
instruments in cancer care settings. The widely acknowledged 
shortage of professional staff for treatment follow-through sug- 
gests a need for screening tools with high sensitivity and high 
specificity that ensure that all patients in need of psychological 
support are identified. We posit that the choice of a screening tool 
ought to consider the psychometric properties of the instrument, 
with special emphasis on its sensitivity and specificity, the treatment 
environment, and the patient's disease stage. 

Psychological measures (which in this review are referred to as 
scales, tools, instruments, and measures) come in varying lengths 
and formats. One important distinguishing feature for various 
scales is their length, which is defined by the number of questions 
or test items they contain; the term "screening tool" usually refers 
to particularly short tests. Longer tests cost more money to admin- 
ister but are sometimes needed to reach acceptable levels of reli- 
ability and validity. The advantages and disadvantages of screening 
tools of varying lengths are summarized in Table 1. We created the 
following length categories according to the number of items in a 
measure: ultrashort (one to four items), short (five to 20 items), and 
long (21-50 items); these cut points were chosen arbitrarily before 
data extraction and review. Ultrashort measures are typically 
limited to one psychological domain, such as depression or anxiety, 
and are the easiest to implement in routine care settings; however, 
they may not be appropriate for use in research settings. Their 
brevity presents a potential economic advantage because fewer staff 
resources are required for their administration and scoring. As one 
meta-analysis (16) demonstrated, ultrashort screening tools can 
possess adequate sensitivity to identify distressed patients but lack 
the specificity to rule out those patients who were wrongly identi- 
fied as distressed (ie, false positives). Test instruments that contain 
more than four items can assess more aspects of emotional impair- 
ment and may possess superior psychometric properties. The trade- 
off is that routine use of longer tools, particularly their scoring and 
interpretation, places more of a burden on staff time. However, 
the availability of touch screen computer-based assessments can 
eliminate this disadvantage because the computer program can 
automatically score the assessment tool and generate a report. 
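
As an illustration only, the length categories defined above can be written as a small Python function. The name `length_category` is hypothetical and is not part of the review. The cut points are the ones stated in the text, and the upper limit of 50 items reflects the review's inclusion criterion.

```python
def length_category(n_items: int) -> str:
    """Return the review's length category for a measure with n_items items.

    Illustrative sketch; the function name is an assumption, the cut points
    (1-4, 5-20, 21-50) are those given in the text.
    """
    if n_items < 1:
        raise ValueError("a measure needs at least one item")
    if n_items <= 4:
        return "ultrashort"
    if n_items <= 20:
        return "short"
    if n_items <= 50:
        return "long"
    # Measures with more than 50 items were not eligible for the review.
    raise ValueError("measures with more than 50 items were excluded")


if __name__ == "__main__":
    for items in (1, 14, 28):
        print(items, length_category(items))
```
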

Included in this review are both newly developed and well-estab- 
lished distress screening tools that have been validated in patients 
with cancer. To the best of our knowledge, this systematic review is 
the most comprehensive review of screening instruments for emotional 
distress in cancer patients to date. In this review, we define 
distress as a state of negative affect that is suggestive of affective dis- 
orders (ie, minor or major depressive disorder and dysthymia), anxiety 
disorders, and adjustment disorders (depressive, anxious, or mixed). 
Measures of related domains (eg, physical symptom distress, lack of 
social support, quality of life, and patient needs) were excluded. 

Methods 

Study Selection 

The data extraction and study review process were performed 
according to the guidelines for systematic reviews of diagnostic 
tests in cancer (17). We searched MEDLINE (1966 to August 
2008) and PsycINFO (1872 to August 2008) databases for English- 
language studies in cancer patients by using the following search 
terms (cancer OR screening OR instrument OR measure OR 
questionnaire OR validation) AND (distress OR depression OR 
anxiety OR adjustment disorder OR negative affect OR psycho- 
logical). After eliminating the duplicate studies, the titles and 
abstracts of the remaining studies were reviewed independently 
by two authors (A. Vodermaier and C. Siu) (Figure 1). These 
authors also reviewed the full-length article for all studies that 
were retained. Their interrater reliability, computed as a kappa 
coefficient, was κ = .86. 
Disagreements about whether or not studies met the inclusion 
criteria were resolved by seeking additional input from the second 
author (W. Linden). The first author (A. Vodermaier) performed 
a detailed assessment of the included studies and identified addi- 
tional validation studies via cross-referencing. 
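
For readers unfamiliar with the kappa coefficient, the following minimal sketch shows how agreement between two reviewers' include/exclude decisions could be computed. It assumes Cohen's kappa for two raters (the text says only "kappa coefficient"), and the decision lists are invented for illustration; they are not the review's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who classified the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty item set")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical decisions: 1 = include, 0 = exclude.
    reviewer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    reviewer_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(round(cohens_kappa(reviewer_1, reviewer_2), 2))
```
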

Study Inclusion and Evaluation Criteria 

A study was included in this review if it attempted to validate a 
newly developed cancer-specific questionnaire (either interviewer 
administered or standardized self-administered) or reported on an 
existing generic measure that had also been validated in a sample 
of cancer patients. The measure could not exceed 50 items and 
must have been published in a peer-reviewed English-language 
journal. We focused on published peer-reviewed studies because 
we expected them to be the most methodologically rigorous, thus 
yielding the strongest conclusions with regard to recommenda- 
tions about tool choice. 

Table 1. Advantages and disadvantages of screening tools of varying length 

Ultrashort (1-4 items) 
    Excellent chance for adoption in busy clinics 
    Sensitivity may be high, low-to-moderate specificity 
    Can only assess one domain 
    Not suitable for research 
    Inexpensive 

Short (5-20 items) 
    Moderate chance for adoption in busy clinics 
    Likely high sensitivity, moderate-to-high specificity 
    Can assess multiple domains 
    May be suitable for research, needs to be tested 
    Some cost in scoring (can be minimized via automation) 

Long (21-50 items) 
    Routine use unlikely unless automated 
    Specificity and sensitivity can be high 
    Can assess multiple domains 
    Excellent for research 
    Potentially costly scoring (can be minimized via automation) 

Figure 1. Flowchart of studies included in systematic review (n = 106). 

Studies included in this review were evaluated on the basis of 
the following criteria: the number of validation studies identified, 
the number of participants across studies, generalizability across 
cancer types and/or disease stages, reliability, type of the criterion 
measure (in which structured clinical interviews such as the Composite 
International Diagnostic Interview or SCID represent the criterion 
standard), and validity. When information on sensitivity, specificity, 
positive predictive value, or negative predictive value was 
partly missing but could be computed on the basis of other data 
presented, we completed these computations. 
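
The kind of computation referred to here can be sketched as follows. This is an illustrative example under stated assumptions, not the procedure used in the review: the function names are hypothetical, and the counts in the usage example are invented. The first function derives all four indices from a 2 x 2 table; the second recovers predictive values when only sensitivity, specificity, and prevalence are reported.

```python
def screening_indices(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV, and NPV from a 2 x 2 table of counts."""
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("every row and column of the table must be non-empty")
    return {
        "sensitivity": tp / (tp + fn),  # distressed patients correctly flagged
        "specificity": tn / (tn + fp),  # nondistressed patients correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who are distressed
        "npv": tn / (tn + fn),          # cleared patients who are not distressed
    }


def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """PPV and NPV from sensitivity, specificity, and prevalence (Bayes' rule)."""
    ppv_den = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    npv_den = specificity * (1 - prevalence) + (1 - sensitivity) * prevalence
    if ppv_den == 0 or npv_den == 0:
        raise ValueError("predictive values are undefined for these inputs")
    ppv = sensitivity * prevalence / ppv_den
    npv = specificity * (1 - prevalence) / npv_den
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical study: 200 patients, 50 of them distressed.
    print(screening_indices(tp=40, fp=10, fn=10, tn=140))
    print(predictive_values(0.8, 140 / 150, 0.25))
```
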

Reliability 

Using the recommendations of statistical experts (18), we required 
an internal consistency of .8 or higher for a screening instrument 
to warrant a designation of high quality. Internal consistency was 
usually reported in the included studies as the Cronbach alpha 
estimate (19) or as the Spearman-Brown rho coefficient (19). 
Reliability data were available for the generic scales included in this 
systematic review. For newly developed cancer-specific scales whose 
psychometric properties have not yet been established, however, 
internal consistency had to be reported for the scale to be rated as 
adequately reliable for screening. Unless subscale reliabilities were 
specifically reported, the Cronbach alpha or Spearman-Brown rho 
represents the internal consistency for the entire scale. Test-retest 
reliability was considered a less important index of reliability than 
the scale's internal consistency because mood in cancer patients is 
known to be unstable and a function of where the patient is in the 
illness trajectory (20). 
Information on test-retest reliability or sensitivity to change was 
included in the description of studies when it was available. 
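
As a minimal sketch of the internal consistency criterion, the Cronbach alpha can be computed from raw item responses and compared with the .8 threshold. The function names and the response data are hypothetical; the included studies reported alpha directly, so the review did not need to perform this step.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach alpha for a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha is not defined for single-item scales")
    # Transpose to get one column of scores per item.
    items = list(zip(*responses))
    sum_item_variances = sum(variance(column) for column in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


def meets_reliability_threshold(alpha: float, threshold: float = 0.8) -> bool:
    """Apply the review's criterion of .8 or higher for high quality."""
    return alpha >= threshold


if __name__ == "__main__":
    # Hypothetical responses of five patients to a three-item scale.
    data = [[3, 3, 2], [1, 1, 1], [2, 3, 2], [0, 1, 0], [3, 2, 3]]
    alpha = cronbach_alpha(data)
    print(round(alpha, 2), meets_reliability_threshold(alpha))
```
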



Validity 

We assumed that the typical screening measures included in this 
systematic review already have face and content validity. Therefore, 
this review focused on information about concurrent, construct, 
and discriminant validity. Concurrent validity is a test's ability to 
measure similar phenomena as do other tests for the same target 
variable, for example, other anxiety tests. Construct validity seeks 
agreement between a theoretical concept and a specific measure. 
For example, a researcher developing a depression scale will first 
make a concerted effort to define depression so that the new test 
actually captures the target variable of depression. Regarding 
quality of validation, we posit that the most important criterion is 
whether or not a screening tool has empirically validated cutoffs 
based on clearly identified sensitivity and specificity data. Hence, 
in this review, we placed the greatest emphasis on the results of 
receiver operating characteristic (ROC) analyses that provided 
empirically justified cutoffs for clinical decision making (ie, dis- 
criminant validity). The ROC curve is a graphical plot of the sen- 
sitivity vs 1 minus the specificity that provides information needed 
for choosing a useful cutoff. For this review, a tool was considered 
to have high validity if the average of its sensitivity and specificity 
estimates was .80 or higher. We searched for evidence of predic- 
tive validity in particular but could not find any study that was 
suitable to be included in the review. 
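
The logic of an ROC analysis and of the validity criterion used here can be illustrated with a short sketch. It is written under assumptions: scores at or above the cutoff count as a positive screen, the data are invented, and selecting the cutoff with the highest mean of sensitivity and specificity is one common convention rather than the method of any particular study in the review. All function names are hypothetical.

```python
def roc_points(scores, has_disorder):
    """Return (cutoff, sensitivity, specificity) for every observed score.

    A patient screens positive when the score is at or above the cutoff.
    """
    positives = sum(bool(d) for d in has_disorder)
    negatives = len(has_disorder) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both cases and noncases are required")
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, has_disorder) if d and s >= cutoff)
        tn = sum(1 for s, d in zip(scores, has_disorder) if not d and s < cutoff)
        points.append((cutoff, tp / positives, tn / negatives))
    return points


def best_cutoff(points):
    """Pick the cutoff with the highest mean of sensitivity and specificity."""
    return max(points, key=lambda p: (p[1] + p[2]) / 2)


def has_high_validity(sensitivity: float, specificity: float) -> bool:
    """Review criterion: mean of sensitivity and specificity of .80 or higher."""
    return (sensitivity + specificity) / 2 >= 0.80


if __name__ == "__main__":
    # Hypothetical screening scores and interview-based diagnoses.
    scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    diagnosis = [False, False, False, False, True, True, False, True, True, True]
    cutoff, sens, spec = best_cutoff(roc_points(scores, diagnosis))
    print(cutoff, sens, spec, has_high_validity(sens, spec))
```
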
Overall Judgment 

Our evaluation of the validation studies used decision rules that are 
summarized in Table 2. The results of individual studies were averaged 
for each measure, with each study weighted by its number of participants, 
to assess overall reliability, type of criterion measure, and validity. Reliability, 
type of criterion measure, and validity were rated as high, mod- 
erate, or low. These three ratings were condensed into a five-level 
overall judgment (excellent, good, moderate, fair, or poor) accord- 
ing to the decision rules described in Table 2. The overall judg- 
ment was "poor" if any of the three criteria was rated as low, 
reliability was not reported, no ROC analysis data were available, 
or the number of participants in a validation attempt was below a 
threshold of 100 when self-report scales were used as the criterion 
measure or below 50 when structured clinical interviews were 
used. Given that generalizability is not of general importance for 
screening tool choice, this criterion did not influence the overall 
judgment. 
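
The participant weighting and the conditions for a "poor" judgment can be sketched as follows. This is a simplified illustration with hypothetical function and parameter names. It covers only the rules stated in this paragraph, not the full set of decision rules in Table 2, and the `reliability_required` flag is an assumption used to represent scales for which reliability did not have to be reported.

```python
def weighted_mean(values, n_participants):
    """Average study-level values, weighting each study by its sample size."""
    total = sum(n_participants)
    if total == 0 or len(values) != len(n_participants):
        raise ValueError("need one positive sample size per study value")
    return sum(v * n for v, n in zip(values, n_participants)) / total


def is_poor(ratings, n_participants, criterion_is_interview,
            has_roc_data, reliability_reported, reliability_required=True):
    """Return True when any condition for a 'poor' overall judgment holds.

    ratings maps 'reliability', 'criterion', and 'validity' to
    'high', 'moderate', or 'low'.
    """
    # Minimum sample size depends on the type of criterion measure.
    min_n = 50 if criterion_is_interview else 100
    return (
        "low" in ratings.values()
        or (reliability_required and not reliability_reported)
        or not has_roc_data
        or n_participants < min_n
    )


if __name__ == "__main__":
    # Hypothetical sensitivities from two studies of the same measure.
    print(weighted_mean([0.9, 0.7], [100, 300]))
    ratings = {"reliability": "high", "criterion": "moderate", "validity": "high"}
    print(is_poor(ratings, 80, criterion_is_interview=True,
                  has_roc_data=True, reliability_reported=True))
```
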

Results 

The literature search identified 2747 publications. A total of 1416 
studies remained after duplicate studies were removed. The 
decision steps are detailed in Figure 1. Data extraction and addi- 
tional articles found via checks of cross-references resulted in 106 
validation studies that described a total of 33 measures. 

Table 3 provides the summary judgments for the screening 
tools based on the predefined evaluation criteria. The key data for 
each study were extracted and are presented in Tables 4, 5, and 6. 
Table 4 presents the validation studies on questionnaires that con- 
tain one to four items, Table 5 describes those containing five to 
20 items, and Table 6 covers those with 21-50 items. When a 
non-English-speaking country is noted in the "sample" column, it 
refers to a version of the scale that was translated according to 
standard forward and backward translation procedures [except in 
one study (125), where this procedure was not used]. The Brief 
Symptom Inventory-18 (BSI-18) (127), the BSI-53 Global Severity 
Index (128), the Center for Epidemiologic Studies-Depression 
Scale (CES-D) (129), the General Health Questionnaire-12 
(GHQ-12) (130), the Hospital Anxiety and Depression Scale 
(HADS) (131), the Patient Health Questionnaire-9 (PHQ-9) 
(132), the Symptom Checklist-90-Revised (133), and the State- 
Trait Anxiety Inventory-trait version (134) were used as criterion 
measures; the BSI-18, CES-D, and HADS were also used as 
screening tools. 

Ultrashort Measures 

A total of 29 studies examined the use of ultrashort screening in- 
struments (Table 4). The majority of these ultrashort measures 
were validated for use in patients with advanced cancer. 

The single-item question "Are you anxious?" (21) was studied 
as a screening tool for emotional distress in palliative care patients 
and showed insufficient specificity to rule out nonanxious patients. 
The anxiety subscale of the HADS (131) was used as the criterion 
measure. 

Table 2. Decision rule 

High 
High 
Moderate 
Moderate 
Moderate 
Moderate 
High 
Moderate 

Moderate 
Moderate 
Moderate 
Fair 
Low or not reported* 
Low 
Low 
Poor† 

Construct validity data only 

No. of participants across studies‡: n < 100 or n < 50 

* For established scales, reliability did not necessarily have to be reported. 
Reliability also was not applicable for one- or two-item scales. 

† If one or more criteria were rated low, the overall judgment was "poor." 

‡ n < 100 when a scale was used as the criterion measure, and n < 50 when a 
structured clinical interview was used as the criterion measure. 

The Brief Case Find Depression is a four-item scale that was 
validated against the Primary Care Evaluation of Mental Disorders 
in a small sample of cancer patients (22). Its interrater reliability 
was low. The measure had moderate specificity and performed 
worse than the HADS and the Beck Depression Inventory (BDI) 
(135) in ruling out nondepressed patients. 

Lengthy questionnaires may be especially burdensome for 
patients in palliative care. For this reason, several studies have 
tested single questions from diagnostic interviews against struc- 
tured clinical interviews as a screening method for depressive 
disorders in palliative care patients. Altogether, seven studies 
(21,23-28) examined the psychometric properties of single 
screening questions; four of these studies (23,24,26,27) tested the 
single question against a structured clinical assessment of the diag- 
nosis. Three studies (23,26,28) also examined the combination of 
the two screening questions (hereafter referred to as the combina- 
tion depression questions) that represent the first and second 
diagnostic criteria for a depressive disorder. The first criterion — 
"Are you depressed?" — yielded perfect sensitivity and specificity to 
detect any kind of depressive disorder and outperformed the BDI 
and the visual analog scales in one study (23). However, in several 
other studies, it had low sensitivity to detect any affective disorder 
(24-27), whereas its sensitivity to detect a major depressive disor- 
der was high across all studies. The second diagnostic criterion for 
a depressive disorder — "Have you lost interest?" — showed the 
same pattern as the first diagnostic criterion question (26) in that 
it was much less sensitive in detecting minor disorders, such as 
adjustment disorder, than in detecting major depressive disorder 
(26). The combined screening questions did not increase the spec- 
ificity compared with each individual question but increased the 
sensitivity (26,28). 

An alternative screening tool, the one-question interview, was 
developed by Akizuki et al. (29) and asked patients to "Please grade 
your mood during the past week by assigning it a score from 0 to 
100, with a score of 100 representing your usual relaxed mood. A 
score of 60 is considered the passing grade." The measure had 
comparable psychometric properties to the HADS (131) and the 
National Comprehensive Cancer Network Distress Thermometer 
(DT) (136), but its criterion validity was low. 

The National Comprehensive Cancer Network DT was intro- 
duced more than a decade ago (136) and measures overall emotional 
distress with one item on an 11-point rating scale (from 0 = no 
distress to 10 = extreme distress). Although domain-specific distress can 
be measured with a complementary problem list that asks whether 
problems exist in practical, familial, emotional, physical, or spiritual 
domains, most of the studies included in this review provided 
psychometric information only on the DT itself. Altogether, 15 
validation studies (29-43) examined the DT. Of these, eight studies 
(33,34,36,38-40,42,43) used the HADS as the criterion measure, 
four studies (31,35,37,41) used exclusively other distress or depres- 
sion scales, and two studies (29,32) relied on clinical diagnosis to 
assess the validity of the DT. The DT scale was tested in popula- 
tions of cancer patients with mixed diagnoses and disease stages, 
breast cancer patients, and patients awaiting bone marrow 
transplantation. 

Two studies (31,43) provided information on the internal con- 
sistency of the problem list. Its overall reliability was good but was 
insufficient for some of the subscales. Sensitivity to change has 
been shown in one study (40): changes in DT scores at 4 and 8 
weeks were comparable to changes in the criterion measures' 
scores. However, the interrater reliability, that is, the congruence 
of patient self-report compared with nurses' judgments, tested in 
one study (29), was moderate. Nurses seemed to underestimate the 
actual distress of the patients (29). Taken together, criterion 
measures were weak to moderate, and most studies demonstrated 
moderate specificity for the DT. 

The optimal cutoff for identifying clinically significant distress in 
most studies was defined as 4 or 5, depending on the diagnostic cri- 
teria or the validation measures used. Compared with nondistressed 
patients, distressed patients reported more problems on the problem 
list (34,35), had lower Eastern Cooperative Oncology Group perfor- 
mance status (34,35), and were more likely to be female (34,38). 

Several modifications and extensions of the DT have been devel- 
oped, including two-item screening tools that combine the DT with 
an impact thermometer, which asks patients about the impact of 
distress on their daily life activity (32), and with a mood thermom- 
eter (33). Both alternatives have been tested in comparison with the 
DT and demonstrate better psychometric properties. 

Two studies (21,44) examined subscales of the Edmonton 
Symptom Assessment System (137) that measure anxiety and de- 
pressive symptoms in comparison with the HADS. The Edmonton 
Symptom Assessment System was developed to assess symptom 
distress in palliative care patients. The scale demonstrated mod- 
erate validity as a screening tool for emotional distress in palliative 
care patients. 

Six studies (23,45-49) examined the validity of visual analog 
scales that were derived from the Memorial Pain Assessment Card 
mood subscale (138) as screening tools for emotional distress in 
various populations of cancer patients. One study (46) reported a 
moderate correlation between patients' self-reported distress and 
the distress levels rated by their physicians. Another study (49) that 
compared several screening instruments with structured clinical 
interviews provided evidence that visual analog scales performed 
worse than other screening measures. 

Short Measures 

Most of the screening measures that have been validated for use in 
cancer patients have between five and 20 items. Altogether, 72 
studies described 15 screening instruments of this length 
(Table 5). 

The BDI-Short Form is a widely used depression scale that 
consists of 13 items (139). Two studies (23,50) examined the psy- 
chometric properties of this scale in populations of patients with 
advanced cancer. The BDI-Short Form demonstrated low inter- 
rater reliability and moderate specificity. 

The BSI-18 is a self-report scale that was designed to assess 
clinically relevant psychological symptoms (127). The scale was 
tested against its two long forms, the BSI-53 Global Severity Index 
(128) and the Symptom Checklist-90-Revised (133). With these 
criterion measures, the BSI-18 demonstrated excellent reliability 
and validity in a large mixed sample of cancer patients with a sen- 
sitivity and specificity of .91 and .93, respectively (52), and in adult 
survivors of childhood cancer with a sensitivity and specificity of .97 
and .85, respectively (53). Internal consistency was high for the 
anxiety and depression subscales (54,55). Results of a factor analysis 
confirmed the scale's three-factor structure (ie, depression, anxiety, 
and somatization) (55). 

The CES-D (129) is a 20-item depression measure that has 
been validated in mixed samples of cancer patients and reference 
groups of healthy control subjects (56-59). Results from factor 
analyses (57) suggested that the negative affect subscale of the 
CES-D was a better measure of depression than the CES-D total 
score. The CES-D demonstrated good internal consistency 
(56,57,59). Two studies (58,59) provided information on the 
scale's sensitivity and specificity and revealed that it has very good 
psychometric properties. 

The Edinburgh Postnatal Depression Scale, a 10-item scale 
that was initially developed to screen for postpartum depression in 
new mothers, measures guilt, worthlessness, and hopelessness 
(140), which are symptoms that may also discriminate between 
depressed and nondepressed patients with advanced cancer. This 
scale was examined as a screening tool for depression in patients 
with advanced cancer (25,51,60,61) and tested against a structured 
clinical interview as the criterion (25,51,60). The sensitivity and 
specificity of the Edinburgh Postnatal Depression Scale were ade- 
quate, and it performed better than the HADS in this population. 
The Edinburgh Postnatal Depression Scale also demonstrated 
good internal consistency and interrater reliability (25,60,61). A 
short form of the Edinburgh Postnatal Depression Scale, the six-item 
Brief Edinburgh Depression Scale, had psychometric properties that 
were comparable to those 
of the original scale (51). 

The GHQ-12 (130) was tested as a screening tool for psycho- 
logical distress in two studies (62,63) and compared with the 
HADS. Both studies demonstrated that the psychometric prop- 
erties of the GHQ-12 were adequate but inferior to those of the 
HADS in samples of patients with advanced cancer. 

The HADS is a 14-item questionnaire that assesses anxiety 
and depressive symptoms in medical settings (131). A total of 41 
of the identified validation studies of screening tools used to 
detect psychological distress in cancer patients were conducted 
on the HADS or compared its psychometric properties with 
other scales (22,26,29,32,33,46,49,50,58,62-93). Ten studies 
(33,66,67,73,79,81,86-89) tested whether or not the known two- 
factor structure of the HADS (which corresponds to the anxiety 
and depression subscales of the questionnaire) could be replicated 



in samples of cancer patients. Most of those studies (3 3 ,66,73,8 1 ,87- 
89) did replicate the two-factor structure of the HADS in cancer 
patients. Two studies (67,86) yielded a three-factor solution and 
one study (78) a four-factor solution. Smith et al. (81) demon- 
strated in a very large sample of cancer patients that the two- 
factor structure was stable across subsamples that were stratified 
by age, sex, and disease stage. The internal consistency of each 
subscale and of the total scale was shown to be adequate 
(33,66,67,72,73,78,82,86,88), and the scale was shown to be sensitive 
to change (72) in cancer patients. 

Twenty-six studies (22,26,49,50,58,62-65,68-70,72-81, 
83,89,91,93) examined the discriminant validity of the HADS by 
comparing it with structured clinical assessments such as the SCID, 
Present State Examination, Clinical Interview Schedule, Clinical 
Interview Schedule-Revised, Psychiatric Assessment Schedule, 
Monash Interview for Liaison Psychiatry, Schedule for Affective 
Disorders and Schizophrenia, Schedule for Clinical Assessment in 
Neuropsychiatry, Composite International Diagnostic Interview, 
Primary Care Evaluation of Mental Disorders, and the Diagnostic 
Interview Schedule. Ten studies (49,58,62,65,70,72,73,75,83,91) 
showed that the screening performance of the HADS was high, 14 
studies (22,26,63,64,68,69,74,76-79,81,89,93) showed moderate 
performance, and two studies (50,80) reported low screening 
performance. 

One study (69) reported that the HADS performed better in 
patients who were disease free or who had stable disease than in 
patients in acute treatment or with advanced disease. The HADS 
failed as a screening instrument in patients newly diagnosed with 
breast cancer (80). 

In some studies (65,74,93), the anxiety subscale of the HADS 
performed better than the depression subscale. Other studies dem- 
onstrated that the HADS total score had psychometric properties 
that were comparable (65) or superior (49) to those of the anxiety 
or depression subscales. We were disconcerted to find that cutoffs 
for distinguishing anxious or depressed patients from nonanxious 
or nondepressed patients differed widely across studies and that 
this variability had not been justified. The cutoffs for the HADS 
total score ranged from 8 to 22 and for the subscale scores from 5 
to 11. 

The Hornheide Questionnaire Short Form is a nine-item ques- 
tionnaire that was validated in 122 German patients with head and 
skin cancer following surgery and had high internal consistency 
(α = .81) (141). One study (49) compared different screening 
measures in a sample of German patients with laryngeal cancer and 
found that the psychometric properties of the Hornheide 
Questionnaire Short Form and of the other instruments were 
inferior to those of the HADS. 

The Impact of Event Scale was originally developed as an 
instrument to measure posttraumatic stress and is a 15-item scale 
that is widely used to assess emotional distress in cancer patients 
(142). One study (94) examined the discriminant validity of the 
Impact of Event Scale to detect adjustment disorder in patients 
undergoing bone marrow transplantation and found that this scale 
had inadequate specificity for use as a screening tool in this popula- 
tion. Other studies (95,96) did not provide further evidence for 
recommending the Impact of Event Scale as a distress screening 
tool in cancer patients. 
The Memorial Anxiety Scale for Prostate Cancer is an 18-item 
scale that was developed for use in prostate cancer patients and 
consists of three subscales: prostate cancer anxiety, prostate-spe- 
cific antigen anxiety, and fear of recurrence (97). Except for the 
prostate-specific antigen anxiety subscale, the Memorial Anxiety 
Scale for Prostate Cancer has good internal consistency. Preliminary 
results of the scale's validity have been reported (97,98), but clin- 
ical cutoffs have yet to be established. The Memorial Anxiety Scale 
for Prostate Cancer was also validated for use in men undergoing 
prostate biopsy (99). 

The Psychological Distress Inventory (100) is a 13-item scale 
that was developed to measure distress in Italian breast cancer 
patients. Its reliability and validity indices are good (79,100). The 
discriminant validity of the Psychological Distress Inventory was 
tested against a structured clinical interview as the criterion, and 
cutoffs of 28 (79) and 29 (100) have been considered clinically sig- 
nificant. However, its use is limited to Italian-speaking patients. 

The Patient Health Questionnaire-9 (PHQ-9) measures 
depressive symptoms according to Diagnostic and Statistical Manual 
of Mental Disorders, Fourth Edition (143) criteria. The PHQ-9 was 
validated in a large sample of primary care and obstetrics and gyne- 
cology patients and found to have strong psychometric properties 
(132). The PHQ-9 also demonstrated adequate reliability as well as 
concurrent and divergent validity in a small study of head and neck 
cancer patients (101) and in a study that used a touch screen com- 
puterized version of the questionnaire (102). However, information 
on the scale's sensitivity and specificity with regard to clinical 
decision making in cancer patients is lacking. 

The Post Traumatic Stress Disorder Checklist-Civilian Version 
(144) was tested as a measure of posttraumatic stress in breast can- 
cer patients (103) and in survivors of bone marrow transplantation 
(104,105). The latter two studies (104,105) examined the scale's 
construct validity and demonstrated that it had high reliability and 
a four-factor structure. In a sample of breast cancer patients, the 
measure showed moderate sensitivity but high specificity to detect 
posttraumatic stress disorder (103). 

One study evaluated the Profile of Mood States-Linear Analog 
Self-Assessment (106) as a screening instrument in cancer patients 
with mixed diagnoses and for patients with different stages of 
disease and compared it with the original Profile of Mood States 
and the Symptom Checklist-90-Revised. The measure demon- 
strated sensitivity to change and concurrent validity. However, not 
enough data are available on its psychometric properties to recom- 
mend its use in clinical decision making. 

The Zung Self-Rating Depression Scale is a 20-item questionnaire 
that evaluates depression (145). Six studies (94,107-111) reported in- 
formation on the scale's psychometric properties in cancer patients. A 
13-item short form of this scale is highly correlated (r = .92) with the 
long form (110). Although the scale (long form) had high reliability, it 
demonstrated low concordance rates with physician ratings of depres- 
sion (107) and moderate validity (94,110) when used for cancer 
patients. Also, the short-form scale was found to have inadequate 
sensitivity compared with the long-form scale (110). 

Long Measures 

Nine scales, each with more than 20 items, were identified for 
screening cancer patients (Table 6). 



One small study of cancer patients (59) examined the psycho- 
metric properties of the Beck Anxiety Inventory (146). The study 
provided evidence that the Beck Anxiety Inventory can be a valid 
measure to screen cancer patients for emotional distress, but there 
is not enough validation information available to justify a recom- 
mendation at this time. 

Five studies (22,58,59,71,112) examined the psychometric 
properties of the 21-item BDI (135), and all but one (112) provided 
data from ROC analyses. One study (22) showed that the scale 
possessed low sensitivity, whereas the other studies demonstrated 
that it had excellent sensitivity and specificity to detect any depres- 
sive disorder. 

The Distress Inventory for Cancer (113) was developed for use 
in head and neck cancer patients. To our knowledge, the only in- 
formation available to date is on the scale's construct validity, and 
more studies on the scale's discriminant validity are necessary 
before a recommendation is possible. 

The GHQ-28 (130) was tested as a screening tool for psycho- 
logical distress in two studies (69,114), where it demonstrated high 
sensitivity and specificity to detect cancer patients with psychiatric 
symptoms. 

The Mood Evaluation Questionnaire (147) is a 23-item 
measure that demonstrated excellent internal consistency but 
only moderate agreement with SCID interview data (115). Its 
discriminant validity was adequate (115). The Mood Evaluation 
Questionnaire has been used for repeated assessments in patients 
with advanced cancer (116). 

One study (117) provided information about the construct 
validity of the Profile of Mood States-Short Form (148) for patients 
awaiting bone marrow transplantation. A factor analysis identified 
six factors that provided evidence for construct validity. The inter- 
nal consistency of the subscales was high, with Cronbach alphas 
that ranged from .78 to .90 (117). To date, there is insufficient 
information on this scale's validity to make recommendations for 
its implementation in routine screening. 
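The internal consistency figures quoted throughout this section are Cronbach alphas. As a minimal sketch of how such a coefficient is computed, the following Python example applies the standard formula to a small response matrix. The function name and the response data are hypothetical and serve only as an illustration; they are not taken from any of the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: four respondents answering a three-item subscale
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Values closer to 1 indicate that the items of a subscale vary together, which is what the reported range of .78 to .90 reflects.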

The 21-item Psychosocial Screen for Cancer was developed in 
mixed samples of cancer patients, and its psychometrics are good 
(118,119). The scale assesses six domains: depressive symptoms, 
anxiety symptoms, quality of life (global), quality of life (number of 
days impaired), perceived social support, and social support 
desired. The anxiety and depression subscales were highly sensitive 
and specific when compared with the HADS. In addition, norma- 
tive data exist that compare different samples of cancer patients 
with healthy control subjects and with a control group of persons 
with a chronic disease other than cancer (119). Specificity data 
suggest the use of a cutoff of 11 for screening of an anxiety or depressive 
disorder and a cutoff of 8 for screening of anxiety and 
depressive symptoms. 

The Questionnaire on Stress in Cancer Patients-Revised is a 
23-item validated scale that was developed in a large sample of 
German patients with diverse cancer diagnoses (120,121). The 
Questionnaire on Stress in Cancer Patients-Revised consists of five 
subscales that measure psychosomatic symptoms, anxiety, informa- 
tion gaps, impairments in everyday life, and social distress. The 
Questionnaire on Stress in Cancer Patients-Revised is highly sensi- 
tive and moderately specific in detecting anxiety and depressive 
symptoms compared with the HADS. However, its use is limited to 
German-speaking patients because, to our knowledge, no psychometric 
information exists on its translation into English (121). 

The Rotterdam Symptom Checklist (RSCL) is a 30-item ques- 
tionnaire that has been used extensively in clinical trials (122). 
Although some studies showed a four- (122,123) or five-factor 
structure of this scale (124), a two-factor psychological and com- 
posite somatic structure has also been suggested (122-126). The 
psychological subscale demonstrated stability across subsamples as 
well as high internal consistency (122,124). Three studies provided 
information from ROC analyses: Two studies (65,69) reported 
that the RSCL had moderate psychometric properties for use as a 
screening tool, and one (74) found that the RSCL failed as a 
screening tool because of its low sensitivity. The RSCL was supe- 
rior to the HADS in two studies (65,69) for samples of patients 
with progressive disease. Three studies reported on the psycho- 
metric properties of non-English [French (126), Italian (123), and 
Spanish (125)] versions of the questionnaire and showed results 
congruent with the original report, thus providing evidence for 
its use in cross-cultural settings. One study (149) reported only 
on an extension of the physical symptom scales of the RSCL and, 
therefore, was not included in this systematic review. 

Discussion 

We have provided extensive details on tool psychometrics, as well 
as details on types of tools and extent of validation, to guide clini- 
cians' own choice of an assessment instrument for routine emo- 
tional distress screening. Making recommendations about which 
screening tools should be used depends on the context in which 
tools are going to be implemented and the intended objectives that 
may vary across settings and users. The following recommenda- 
tions were based on composite quality criteria that we defined 
using transparent decision rules (Table 2). 

Among ultrashort measures, the two-item combination depres- 
sion questions had the best psychometric properties. The widely used 
DT had been subjected to the most validation studies on the largest 
patient samples but was not validated against a structured clinical 
interview with sufficiently established psychometric properties. For the DT, the 
sensitivity and specificity findings were lower than 80% in about half 
and two-thirds, respectively, of the validation studies. However, some 
evidence suggests that modifications of the DT, such as the Mood 
Thermometer (33), or expansions, such as the Impact Thermometer 
(32), may represent improvements over the original scale. 

Our findings regarding ultrashort measures differ in part from 
the results of other meta-analyses and reviews on screening tool 
validity. Meta-analyses (16,150) as well as studies in primary care 
(151,152) have demonstrated a lack of specificity in ultrashort 
measures (including the DT) for identifying depression. However, 
our results reveal that this criticism does not apply to the combi- 
nation depression questions as these were found to demonstrate 
high specificity. 

When it comes to ultrashort measures, patients have reported 
that a single-item interview format did not accurately describe or 
capture their mood (38,116). In line with these findings, Ohno et al. 
(153) reported that 65% of patients responded to the question "Are 
you depressed or not?" with "neither," which indicates their uncertainty 
when rating emotional distress with such a simple question, 
even though their HADS scores suggested that they had clinical 
depression. Furthermore, agreement between ultrashort and longer 
measures in identifying distressed patients detected by structured 
clinical interviews was poor (115). Problems with determining the 
face validity of single-item measures as well as patients' difficulty 
with scaling on single-item screening tools could explain these dis- 
crepant findings. Consequently, further comparison studies investi- 
gating tools of different lengths should be conducted. 

Among the short measures, we can recommend the CES-D as 
a screening tool for depression because it met all criteria for 
quality. The most extensive validation existed for the HADS, and 
this was the case across disease types and stages as well as across 
languages and cultures. The scale has been extensively tested 
against criterion standards. 

Note that many other tools relied on the HADS for discrimi- 
nant validation. Studies that compared the discriminant validity of 
the HADS against other scales found that the HADS was superior 
(26,49,58,62,63) or equivalent (65,69) to other measures. With 
regard to whether or not to use the total score or the subscale 
scores of the HADS, several studies showed that the total score was 
superior in nonpsychiatric patients (49,65,154). 

The BSI-18 and the GHQ-12 are short measures that also 
demonstrated good psychometric properties. Nevertheless, ROC 
analyses of the BSI-18 were based on comparisons of short form 
with the long form of the same instrument and do not, therefore, 
represent independent validation (52,55). In addition, the GHQ-12 
consistently performed worse than the HADS (62,63). Nonetheless, 
both scales have also been used as criterion measures in validation 
attempts of other scales. 

The Post Traumatic Stress Disorder Checklist-Civilian 
Version, the Psychological Distress Inventory, and the Hornheide 
Questionnaire Short Form are short measures that demonstrated 
adequate psychometric properties. However, their use to date is 
limited to specific cancer types or language applications. For 
patients receiving palliative care, the Edinburgh Postnatal 
Depression Scale or its short form, the six-item Brief Edinburgh 
Depression Scale, demonstrated adequate psychometric properties. 
Because of the strong psychometric properties of the PHQ-9 in 
large samples of primary care and obstetrics and gynecology patients 
(132), this scale deserves further empirical evaluation of its value 
for distress screening of cancer patients. 

Among the long measures, the BDI and the GHQ-28 met all 
quality criteria. The Psychosocial Screen for Cancer has not been 
validated against a structured clinical interview but otherwise met 
all criteria. In addition, the Psychosocial Screen for Cancer pro- 
vides information on the social support that a patient desired and 
actually received, which may also guide decision making in psycho- 
oncological follow-up. The Questionnaire on Stress in Cancer 
Patients-Revised was validated in a large sample of cancer patients 
and provided good psychometric properties. The existing English 
version of the scale, therefore, deserves validation as a 
screening tool for emotional distress in cancer patients. Finally, the 
RSCL is a long measure that demonstrated adequate psychometric 
properties for distress screening. 

Cancer-specific tools may provide more relevant information 
than generic scales on patients with a specific type of cancer; how- 
ever, some of these tools, such as the Memorial Anxiety Scale for 
Prostate Cancer (97), require additional validation. Furthermore, 
the routine use of cancer-specific tools is particularly likely to be 
implemented in specialized centers such as those that treat breast 
or prostate cancer patients. Facilities that treat patients with a 
broader disease spectrum may benefit most from a screening tool 
that can be applied to a mixed patient population, such as well- 
established scales including the BDI, the CES-D, the GHQ-28, or 
the HADS. Furthermore, the use of a scale that assesses anxiety as 
well as depressive symptoms, such as the BSI-18, GHQ-28, the 
HADS, the Psychosocial Screen for Cancer, or the RSCL, may 
prevent anxiety disorders from being overlooked within a routine 
screening program. 

We argue that, depending on the physical condition of the 
patients and the treatment setting, relatively short tools should be 
used for the screening of palliative care patients or patients who are 
undergoing strenuous treatment. Furthermore, the use of shorter 
tools for routine screening in an inpatient setting is easier to justify 
and implement. By contrast, patients who have completed treat- 
ment, have follow-up appointments, or are attending rehabilitative 
care may have more physical resources (eg, compared with patients 
under chemotherapy treatment or palliative care patients) and 
more time to complete longer questionnaires. Moreover, cancer 
patients who are undergoing treatment may require immediate 
psychological support, whereas cancer survivors may need to adapt 
to the disease in the long term. For the latter patients, a more 
extensive psychological assessment seems to be needed. 

Although single-item interviews may have a useful role in 
assessing distress in palliative care patients by minimizing patient 
burden, it is also true that somewhat longer scales may have higher 
content validity and may be better suited for longitudinal assess- 
ments. Future research should compare the accuracy and appropri- 
ateness of tools of differing lengths in specific treatment settings. 

Choosing a tool for routine screening of cancer patients requires 
a trade-off between a measure with adequate psychometric prop- 
erties and one with a reasonable length. It has been shown that 
computerized versions of screening instruments that use touch 
screen technology can be used successfully, including by older 
patients (155). The use of fully computerized touch screen and 
autoscoring technology minimizes the workload of oncology treat- 
ment personnel, further reduces costs, and ensures the continuity 
and standardization of its application. 

The usefulness of a screening program for emotional distress 
can be evaluated according to whether or not screened patients 
accept referral to a mental health professional. Shimizu et al. (156) 
found that neither patient demographic variables nor the level of 
physical functioning, disease stage, or treatment status was associ- 
ated with acceptance of a referral by the patient, whereas level of 
distress was, thus providing evidence that screening for emotional 
distress can result in enhanced utilization of psychological treat- 
ment. Compared with structured clinical interviews, distress 
screening instruments tend to overestimate the prevalence rates of 
depressive disorders in cancer patients (116). In this regard, 
measures that have superior psychometric properties may, there- 
fore, reduce the workload of psycho-oncology staff and allow for 
the accurate forecasting of resource needs. When clinic staff, alone 
or in cooperation with researchers, want to undertake distress 
tracking over time to assess treatment outcomes and/or learn more 
about adjustment processes longitudinally, then ultrashort 
screening tools tend to fall short because they lack a range of 
scores. Only the longer versions of measures can accomplish such 
objectives. 
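The link between a tool's accuracy and the workload of psycho-oncology staff can be made concrete with simple arithmetic. The Python sketch below is illustrative only: the function name and all input values (1000 screened patients, a 20% prevalence, and 80% sensitivity and specificity) are hypothetical assumptions, not figures from the reviewed studies.

```python
def expected_screen_positives(n_patients, prevalence, sensitivity, specificity):
    """Expected numbers of true and false screen positives in a screened cohort."""
    # Patients with the disorder who are correctly flagged
    true_pos = n_patients * prevalence * sensitivity
    # Patients without the disorder who are flagged anyway
    false_pos = n_patients * (1 - prevalence) * (1 - specificity)
    return true_pos, false_pos


# Hypothetical clinic: 1000 patients screened, 20% truly affected
tp, fp = expected_screen_positives(1000, 0.20, 0.80, 0.80)
screen_positive_rate = (tp + fp) / 1000
print(round(tp), round(fp), round(screen_positive_rate, 2))
```

Under these assumed values, roughly as many unaffected as affected patients screen positive, so the proportion flagged (about 32%) exceeds the assumed prevalence of 20%. Raising specificity in the call above shrinks the false-positive count and, with it, the number of follow-up assessments that staff would need to plan for.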

Several limitations of this systematic review must be noted. 
Some validation studies or measures could have been overlooked 
because only peer-reviewed articles were included 
in this review. On the other hand, the scientific accuracy of such 
studies or measures would have remained unclear because of their 
lack of peer review. Furthermore, we only included validation 
studies that provided information on construct validity, discrimi- 
nant validity, and/or concurrent validity for at least one additional 
measure, and we excluded feasibility studies that only reported on 
the measure itself or on a translation of the measure. Many studies 
that were included only reported on limited aspects of validation. 
Of these, several described results of factor analyses, as well as 
subscale and total scale reliabilities, whereas others provided data 
from ROC analyses without information on reliability. Also, many 
included studies did not provide sufficient descriptive statistics to 
allow us to compute missing indices of sensitivity, specificity, pos- 
itive predictive values, and negative predictive values. Consequently, 
the conclusions we draw in this review depend on the information 
given in the original reports. However, the strength of a systematic 
review is that it provides a broader scope than meta-analyses, 
which typically combine studies of varying types and consequently 
provide only summary statistics. Hence, this systematic review 
is, to our knowledge, the most comprehensive review to date 
that addresses a broad range of screening tools, varying types of 
cancers, and disease stages. 
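For readers who wish to derive missing indices from a study that does report its cross-classification of screening result against criterion diagnosis, the four indices named above follow directly from the 2 x 2 table. The Python sketch below uses the standard definitions; the function name and the counts are hypothetical and do not come from any study in this review.

```python
def screening_indices(tp, fp, fn, tn):
    """Accuracy indices from a 2 x 2 table of screen result by criterion diagnosis."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly flagged
        "specificity": tn / (tn + fp),  # noncases correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who are cases
        "npv": tn / (tn + fn),          # cleared patients who are noncases
    }


# Hypothetical validation sample of 200 patients
idx = screening_indices(tp=40, fp=30, fn=10, tn=120)
for name, value in idx.items():
    print(f"{name}: {value:.2f}")
```

When a report gives only sensitivity and specificity without the underlying counts or the sample prevalence, the predictive values cannot be recovered in this way, which is the limitation described above.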

In conclusion, several generic and newly developed cancer- 
specific instruments meet high-quality criteria for use in emotional 
distress screening of cancer patients. Many general emotional dis- 
tress screening tools focus on depression. Nonetheless, highly 
prevalent transient anxiety or mixed emotional disorders that 
occur during the cancer diagnosis and treatment trajectory deserve 
the attention of clinicians. Hence, the exclusive use of a depression 
scale may overlook other disorders (eg, anxiety disorders). 
Consequently, a scale that measures mixed emotional states rather 
than depression only has clear merit for clinical practice. 

Apart from purely psychometric considerations, large-scale 
implementation of screening for emotional distress may not occur 
if a given test has to be purchased for each use. This factor alone 
may have an impact on the choice of a screening tool, given that 
some well-validated screening tools have to be purchased for every 
use, whereas others are available at no cost. Another useful crite- 
rion for deciding which tool to use is the treatment setting. For 
example, treatment centers that specialize in breast or prostate 
cancer may prefer to use disease-specific measures. 

In terms of actual decision making, it is important to recognize 
that a measure's sensitivity and specificity are a function of the cut- 
off that is used to distinguish anxious or depressed patients from 
nonanxious or nondepressed patients. Higher cutoffs improve the 
measure's specificity, and treatment facilities can decide upfront, 
by consciously choosing a specific cutoff, the amount of psycho- 
logical and psychotherapeutic follow-up treatment they are willing 
to or can provide. Given that we were able to find a large number 
of well-executed validation studies on distress screening tools, we 
suggest that the development of additional tools at this time 
should be discouraged to avoid redundancy. However, it may be 
worthwhile to initiate additional validation work 
on the tools that have good psychometric properties but 
that have not yet been validated against criterion standards. 
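The dependence of sensitivity and specificity on the chosen cutoff can be illustrated with a small worked example. The Python sketch below is a minimal illustration under stated assumptions: the scores, the case labels, and the three cutoffs are invented, the function name is hypothetical, and a patient is treated as screen positive when the score is at or above the cutoff.

```python
def sens_spec_at_cutoff(scores, is_case, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as screen positive."""
    pairs = list(zip(scores, is_case))
    tp = sum(1 for s, c in pairs if c and s >= cutoff)
    fn = sum(1 for s, c in pairs if c and s < cutoff)
    tn = sum(1 for s, c in pairs if not c and s < cutoff)
    fp = sum(1 for s, c in pairs if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical questionnaire scores and criterion diagnoses for ten patients
scores = [3, 5, 7, 8, 9, 10, 12, 14, 16, 19]
is_case = [False, False, False, True, False, False, True, True, True, True]

# Evaluate three candidate cutoffs, from lenient to strict
table = {c: sens_spec_at_cutoff(scores, is_case, c) for c in (8, 11, 15)}
for cutoff, (sens, spec) in table.items():
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this invented sample, moving the cutoff upward never lowers specificity and never raises sensitivity, which is the trade-off a treatment facility accepts when it fixes a cutoff in advance.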

Worthy of note is an ongoing National Institutes of Health 
project — the Patient-Reported Outcomes Measurement Information 
System network (http://www.nihpromis.org/default.aspx) — to 
improve measures of patient-reported outcomes. A number of 
tools for the assessment of emotional distress in patients with 
chronic diseases are in the process of being developed within this 
network that may be useful as potential screening tools for emo- 
tional distress in cancer patients in the future. 

Empirical findings published to date do not allow us to judge 
the predictive validity of screening tools for emotional distress. 
Nonetheless, the screening tools recommended here are effective 
for routine screening of emotional distress based on their high 
sensitivity and specificity. However, further information is needed 
about how screening affects long-term outcomes and patient 
quality of life. 